THE PROBLEM

Voice  command  and  control  systems  have  been  proposed  as  a 
potential  means  of  off-loading  the  typically  overburdened  visual 
information  processing  system.  However,  prior  to  the  introduction 
of  novel  human-machine  interfacing  technologies  in  high  workload 
environments,  consideration  must  be  given  to  the  integration  of 
the  new  technologies  within  existing  task  structures  to  ensure 
that  no  new  sources  of  workload  or  interference  are 
systematically  introduced.  This  study  examined  the  use  of  voice 
interactive  systems  technology  in  the  joint  performance  of  two 
cognitive  information  processing  tasks  requiring  continuous 
memory  and  choice  reaction  wherein  a  basis  for  intertask 
interference might be expected. Stimuli for the continuous
memory task were presented aurally and either voice or keyboard
responding was required in the choice reaction task. The effects
of intertask stimulus similarity on multitask performance were
also examined.

FINDINGS 

Performance was significantly degraded in each task when
voice  responding  was  required  in  the  choice  reaction  time  task. 
Performance  degradation  was  evident  in  higher  error  scores  for 
both  the  choice  reaction  and  continuous  memory  tasks.  Performance 
decrements  observed  under  conditions  of  high  intertask  stimulus 
similarity  were  not  statistically  significant. 

RECOMMENDATIONS 

The results signal the need to consider further the task
requirements for verbal short-term memory when applying speech
technology in multitask environments. Research should be directed
toward identifying other potential sources of intertask
interference with information processing to assist system task
integration, function allocation, and the introduction of novel
human-machine interfacing techniques in high workload, multitask
environments.
INTRODUCTION


Modern, high-speed aviation weapon systems impose complex
information  processing  and  workload  demands  on  operators. 
Demands  on  the  human  visual-manual  input-output  system  are 
especially  extreme  and  may  compromise  operational  system 
effectiveness  through  operator  induced  errors.  Voice  command  and 
control systems have been proposed as a potential means of
off-loading the typically overburdened visual information processing
system.  However,  prior  to  the  introduction  of  novel  human-machine 
interfacing  technologies  in  high  workload  environments, 
consideration  must  be  given  to  the  integration  of  the  new 
technologies  within  existing  task  structures  to  ensure  that  no 
new  sources  of  workload  or  interference  are  systematically 
introduced.  Because  of  the  complex  psychomotor  and  cognitive 
requirements  of  the  high  workload  environment  of  interest,  viz., 
the  cockpit,  the  integration  of  verbal  and  manual  control  tasks 
is  of  particular  concern. 

In  a  review  of  the  literature  on  the  concurrent  performance  of 
verbal tasks and manual tracking tasks, Harris (1) noted the lack
of  an  adequate  data  base  to  support  the  integration  of  voice 
technology  within  multitask  environments.  Harris  recommended  a 
comprehensive,  systematic  research  program  to  identify 
performance  capabilities  and  limitations  in  concurrent  verbal  and 
manual  control  tasks.  In  a  general  sense,  the  concern  is  whether 
new human-machine interfacing techniques can facilitate system
performance  by  utilizing  relatively  unused  input  and  output 
channels,  such  as  hearing  and  speech.  Although  no  comprehensive 
data  base  exists,  several  studies  (3,  4,  5)  have  provided  support 
for  the  notion  that  the  auditory  input  and  speech  output  channels 
can  serve  effectively  as  additional,  parallel  means  of 
information  handling  when  the  visual  input  and  manual  output 
channels  are  occupied  with  the  processing  of  spatial  information. 
However,  other  studies  (2)  have  failed  to  provide  support  for 
that  claim.  Resolution  of  these  discrepancies  may  result  from  a 
better  understanding  of  the  strategies  that  subjects  employ  in 
multitask  situations  (Damos,  Note  1),  of  the  relationship  between 
the  information  processing  requirements  of  the  constituent  tasks 
in  the  multitask  environment  (6),  and  of  the  unique  processing 
requirements  of  each  task  considered  separately. 

A  study  by  Harris,  Owens,  and  North  (2)  addressed  the  latter 
point.  The  authors  employed  a  multitask  situation  consisting  of 
the  concurrent  performance  of  a  manual  tracking  task  and  a  digit 
processing  task  that  required  the  continuous  use  of  short-term 
memory.  When  keyboard  responding  was  required  in  the  latter  task, 
auditory  presentation  of  stimuli  resulted  in  performance  which 
was  superior  to  that  obtained  with  visual  presentation  of 
stimuli. When voice responding was required, visual presentation
yielded performance superior to that obtained with auditory
presentation. The
authors  hypothesized  that  this  stimulus-response  mode  interaction 
resided  in  the  peculiar  information  processing  requirements  of 
the digit processing task, viz., that the requisite rehearsal and
retrieval processes were more susceptible to disruption by
intervening vocal responses than by manual responses.

The  purpose  of  the  present  investigation  was,  in  a  general 
sense,  to  explore  sources  of  potential  interference  when  voice 
input/output  (I/O)  systems  are  employed  in  high  workload 
environments.  The  specific  purpose  of  the  study  was  to  examine 
further  the  effects  of  stimulus  and  response  characteristics  on 
memorial  processes  in  multitask  settings.  Two  cognitive  tasks,  a 
continuous  memory  digit  processing  task  and  a  choice  reaction 
time  (CRT)  task,  were  employed  to  represent  typical  kinds  of 
information  processing  required  of  operators  in  high  workload 
environments.  To  determine  if  the  response  requirements  of  the 
CRT  task  interfered  with  information  processing  in  the  digit 
processing  task,  stimuli  in  the  latter  task  were  always  presented 
aurally,  while  the  CRT  task  required  voice  responses  in  one 
condition,  and  keyboard  responses  in  another.  In  addition, 
intertask  stimulus  similarity  was  manipulated  by  using  visually 
presented  digits  in  one  CRT  task  condition,  and  colored  lights  in 
the  other.  Specifically,  it  was  hypothesized  that  due  to  the 
auditory  properties  of  human  short-term  memory  (e.g., 
subvocalization  in  rehearsal)  and  the  interference  effects  of 
highly similar items, performance on the tasks would be most
disrupted when the CRT task required both voice responses and the
processing of digit stimuli.

METHOD 


SUBJECTS 

Twenty-four right-handed, male student naval aviators
between the ages of 22 and 25 years participated as subjects in
the  experiment. 

EXPERIMENTAL  DESIGN 

Subjects were tested in single- and dual-task performance of
both a continuous memory digit-processing task and a
four-alternative, visual choice reaction time task. Stimuli for the
digit processing task were presented aurally and required
button-press responses in all experimental conditions. Intertask item
similarity was a between-subjects variable (color versus
digits), while the response mechanism required in the choice
reaction time task was a within-subjects variable (voice versus
keyboard).

The  nine  experimental  conditions  described  in  Table  I  were 
used  to  form  the  test  orders  listed  in  Table  II. 


TABLE  I 


Experimental  Conditions 


IKS  single-task, digit processing, keyboard response
1KD  single-task, choice reaction, digits, keyboard response
1VD  single-task, choice reaction, digits, vocal response
1KC  single-task, choice reaction, colors, keyboard response
1VC  single-task, choice reaction, colors, vocal response
2KD  dual-task, digit choice reaction stimuli, keyboard response
2VD  dual-task, digit choice reaction stimuli, vocal response
2KC  dual-task, color choice reaction stimuli, keyboard response
2VC  dual-task, color choice reaction stimuli, vocal response


TABLE  II 
Test  Orders 


Order #1   IKS   1KD   2KD   1VD   2VD
Order #2   IKS   1VD   2VD   1KD   2KD
Order #3   IKS   1KC   2KC   1VC   2VC
Order #4   IKS   1VC   2VC   1KC   2KC

Six subjects were randomly assigned to each order to
counterbalance response mechanism within each of the two intertask item
similarity conditions. At the start of the experimental session,
the speech recognition device was trained to each subject's
voice. Training consisted of having the subject repeat each
possible response in the choice reaction task ("one", "two",
"three", and "four" for half the subjects; "red", "yellow",
"blue", and "white" for the other half) ten times with the speech
recognition device in its training mode. Single task conditions
consisted of 50 stimulus presentations; dual task conditions
consisted of 100 stimulus presentations, 50 for each task.
Conditions were separated by 5-min rest periods.
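
The assignment of subjects to test orders can be sketched as
follows. This is only an illustration of the counterbalancing
scheme described above, not the procedure actually used in the
study: the function name assign_subjects, the subject numbering,
and the seeded shuffle are assumptions, and the condition codes
are copied as they appear in Tables I and II.

```python
import random

# Test orders as listed in Table II (condition codes from Table I).
TEST_ORDERS = {
    1: ["IKS", "1KD", "2KD", "1VD", "2VD"],  # digits, keyboard first
    2: ["IKS", "1VD", "2VD", "1KD", "2KD"],  # digits, voice first
    3: ["IKS", "1KC", "2KC", "1VC", "2VC"],  # colors, keyboard first
    4: ["IKS", "1VC", "2VC", "1KC", "2KC"],  # colors, voice first
}


def assign_subjects(subject_ids, orders=TEST_ORDERS, seed=None):
    """Randomly assign an equal number of subjects to each test order.

    Hypothetical helper: the report states only that six subjects
    were randomly assigned to each order.
    """
    subjects = list(subject_ids)
    if len(subjects) % len(orders) != 0:
        raise ValueError("subjects must divide evenly among the orders")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    per_order = len(subjects) // len(orders)
    assignment = {}
    for index, order in enumerate(sorted(orders)):
        group = subjects[index * per_order:(index + 1) * per_order]
        for subject in group:
            assignment[subject] = order
    return assignment


if __name__ == "__main__":
    assignment = assign_subjects(range(1, 25), seed=0)
    for subject in sorted(assignment):
        order = assignment[subject]
        print(subject, order, " ".join(TEST_ORDERS[order]))
```

Orders 1 and 2 share the digit stimuli and differ only in which
response mechanism comes first, as do orders 3 and 4 for the
color stimuli, so response mechanism is balanced within each
similarity condition.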


APPARATUS 


The subject was seated at a performance console in a
sound-attenuated booth which was separated from the experimenter's
control station. Stimulus sequences were controlled by a Data
General Corporation Nova 800 minicomputer with 32k x 16 memory. A
custom-built interface received and decoded switch closures from
the keyboard used by the subject and transmitted codes to the
Nova computer.

Voice  recognition  of  subject  responses  and  voice  synthesis 
of  the  absolute  difference  task  stimuli  were  performed  by  a  Scope 
Electronics Voice Data Entry Terminal System (VDETS) which
consisted  of  a  Data  General  Corporation  Nova  2/10  minicomputer 
with  16k  X  16  core  memory,  a  Scope  user's  station,  a  voice 
synthesizer,  and  an  ASR-33  teletype.  The  Nova  2/10  was  hosted  by 
the  Nova  800.  The  Scope  user's  station  converted  voice  analog 
signals  from  a  microphone  mounted  on  the  subject's  headset  to 
digital  format  for  entry  into  the  Nova  2/10.  A  Vocal  Interface 
Division  Model  VS-6  VOTRAX  voice  synthesis  unit  provided  auditory 
output  signals  to  the  subject's  headset  in  the  testing  booth.  The 
teletype  was  used  by  the  experimenter  to  control  and  monitor  the 
VDETS  utterance  recognition  performance. 

The  keyboard  for  the  digit  processing  task  was  positioned  on 
the  left  side  of  the  console  and  arranged  in  a  single  horizontal 
row  of  four  buttons  labeled  "1",  "2",  "3",  and  "4".  The 
precontact travel of the microswitches was approximately 1 mm.

The keyboard for the visual four-alternative CRT task was
positioned  on  the  right  side  of  the  console  and  was  arranged  in  a 
horizontal  row  of  four  microswitches.  From  left  to  right  the 
labels  "red",  "yellow",  "blue",  and  "white"  were  positioned  above 
the  switches.  In  addition,  the  labels  "1",  "2",  "3",  and  "4" 
were  positioned  from  left  to  right  below  the  switches. 

The stimuli for the CRT task were presented via an IEE
one-plane readout which was located at a point 20 degrees of
visual angle below the subject's eye level and directly above the
choice reaction task keyboard. The projection surface of the
readout was illuminated under computer control with either a red,
yellow, blue, or white light in one condition, or with the
projected image of the numeral 1, 2, 3, or 4 in the other.

PROCEDURE 

Single task digit processing. In this self-paced task the
subject was required to compute the absolute difference between
two successive digits presented in a pseudo-random sequence.
Stimulus  digits  varied  between  zero  and  nine.  As  soon  as  the 
subject  responded  with  the  absolute  value  of  the  difference 
between  the  current  digit  and  the  previous  digit  in  the  sequence, 
a  new  digit  was  presented.  An  example  of  a  typical  presentation 
sequence  and  associated  responses  is  given  below: 


stimulus sequence:  7-4-8-6-3-1-0...

subject responses:    3-4-2-3-2-1...

Stimulus  presentation  was  arranged  such  that  only  the  digits  one 
through  four  were  possible  correct  responses.  In  the  event  the 
subject forgot the previous stimulus digit, he could request that
it be repeated by saying "again", whereupon the stimulus was
repeated. If the recognition system failed to understand the
subject's response, the subject was notified through the VOTRAX
unit with the phrase "say again", whereupon the subject repeated
the response. Response times on correct trials and number of
errors were recorded for each of 50 test trials.
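
The digit processing task can be illustrated with a short sketch.
The first function reproduces the example sequence given above.
The second is an assumption: the report says only that the
sequence was pseudo-random and that correct responses were limited
to the digits one through four, so the uniform choice among
admissible digits and the name generate_sequence are hypothetical.

```python
import random


def absolute_differences(stimuli):
    """Return the correct response for each digit after the first."""
    return [abs(cur - prev) for prev, cur in zip(stimuli, stimuli[1:])]


def generate_sequence(n_responses, rng=None):
    """Draw digits 0-9 so that every correct response is 1, 2, 3, or 4.

    Assumed generation rule: each new digit is chosen uniformly from
    the digits whose distance from the previous digit is 1 to 4.
    """
    rng = rng or random.Random()
    sequence = [rng.randint(0, 9)]
    while len(sequence) < n_responses + 1:
        previous = sequence[-1]
        candidates = [d for d in range(10) if 1 <= abs(d - previous) <= 4]
        sequence.append(rng.choice(candidates))
    return sequence


if __name__ == "__main__":
    # Example from the text: stimuli 7-4-8-6-3-1-0.
    print(absolute_differences([7, 4, 8, 6, 3, 1, 0]))
    trial_digits = generate_sequence(50, random.Random(1))
    print(trial_digits)
```

A session of 50 scored trials needs 51 stimulus digits under this
sketch, because the first digit has no predecessor to compare
against.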

Single task choice reaction time. In this experimenter-paced
task, the subject was required to respond to visually
presented stimuli (colors or numerals). After a variable
foreperiod of either 0.5, 1.0, or 1.5 seconds, the stimulus was
turned on and remained on until the subject responded or 350 msec
had elapsed. If the subject failed to respond within 2000 msec
or made an inaccurate response, an error was recorded. The
stimuli, the numerals 1, 2, 3, and 4 in one condition, and the
colors red, yellow, blue, and white in the other, were presented
in a pseudo-random order. Response times on correct trials and
number of errors were recorded for each of 50 test trials.
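
The scoring rule for the choice reaction task can be sketched as
follows. The Trial record, its field names, and the summarize
helper are hypothetical; only the 2000 msec response limit and the
two dependent measures (number of errors, response time on correct
trials) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RESPONSE_DEADLINE_MS = 2000  # response limit stated in the procedure


@dataclass
class Trial:
    stimulus: str                 # e.g. "red" or "3"
    response: Optional[str]       # None if no response was registered
    latency_ms: Optional[float]   # stimulus onset to response


def is_error(trial: Trial) -> bool:
    """An error is a missing, late, or inaccurate response."""
    if trial.response is None or trial.latency_ms is None:
        return True
    if trial.latency_ms > RESPONSE_DEADLINE_MS:
        return True
    return trial.response != trial.stimulus


def summarize(trials: List[Trial]) -> Tuple[int, Optional[float]]:
    """Return (number of errors, mean latency on correct trials)."""
    correct = [t for t in trials if not is_error(t)]
    n_errors = len(trials) - len(correct)
    if not correct:
        return n_errors, None
    mean_latency = sum(t.latency_ms for t in correct) / len(correct)
    return n_errors, mean_latency


if __name__ == "__main__":
    # Made-up trials for illustration only; not data from the study.
    demo = [
        Trial("red", "red", 600.0),
        Trial("blue", "white", 700.0),
        Trial("white", None, None),
        Trial("yellow", "yellow", 800.0),
    ]
    print(summarize(demo))
```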

Dual  task  condition.  Following  a  variable  foreperiod  as 
above,  the  choice  reaction  stimulus  was  turned  on  for  350  msec. 
As  soon  as  the  subject  responded,  or  2000  msec  had  elapsed,  the 
digit  processing  task  stimulus  was  presented  by  the  VOTRAX.  As 
soon  as  the  subject  responded  to  the  digit  processing  task 
stimulus  the  next  choice  reaction  stimulus  was  presented.  After 
the  first  trial  there  was  no  variable  foreperiod  in  the  CRT  task; 
the  onset  of  choice  reaction  task  stimuli  immediately  followed  a 
response  to  the  digit  processing  task.  The  sequence  was  repeated 
for  a  total  of  100  stimulus  presentations,  50  for  each  task, 
during  each  dual-task  session. 


RESULTS 

Single  task  trials  were  regarded  as  practice  and  were  not 
considered  in  the  following  analyses.  Total  number  of  errors  and 
correct  response  latencies  were  averaged  across  subjects  within 
cells  and  examined  separately  for  each  task  performed  under  dual 
task  conditions.  Split-plot  two-way  analyses  of  variance  were 
used throughout. For CRT task performance, neither the main
effect for intertask stimulus similarity nor the interaction
between stimulus similarity and response mode was significant
for errors (F(1,22) = 1.904, p > .05 and F(1,22) = 1.32, p > .05) or
for correct response latency (F(1,22) = 0.112, p > .05 and
F(1,22) = 3.052, p > .05). However, the main effect of response mode
was significant for both number of errors, F(1,22) = 11.88, p < .01,
and correct response latency, F(1,22) = 21.483, p < .01. From
Figures 1 and 2, it can be seen that errors and response
latencies were greater in the CRT task when voice responding was
required. An important qualification is discussed below in
relation to the latency data obtained in the voice response mode.


Figure 1. Mean number of errors in the choice
reaction task as a function of
response modality.


Figure 2. Average correct response latency
in the choice reaction task as a
function of response modality.
(Recoverable plot labels: COLORS, DIGITS; KEYS, VOICE.)


For continuous memory digit processing task performance,
neither the main effect for stimulus similarity nor the
interaction between stimulus similarity and response mode was
significant for errors (F(1,22) = 1.627, p > .05 and F(1,22) =
2.433, p > .05) or for correct response latency (F(1,22) = 0.115,
p > .05 and F(1,22) = 0.001, p > .05). In addition, the main effect
of response mode was not significant for correct response latency
(F(1,22) = 0.176, p > .05). The response mode main effect was
significant for number of errors, however (F(1,22) = 4.525, p
< .01). As shown in Figure 3, more errors occurred in the digit
processing task under conditions requiring voice responding.

Figure 3. Mean number of errors in the continuous
memory digit processing task as a
function of response modality.
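
The F ratios reported above can be related to significance levels
with a short calculation. The sketch below is not part of the
original analysis. It assumes the standard identity that an F
ratio with 1 and n degrees of freedom is the square of a Student's
t with n degrees of freedom, and it integrates the t density
numerically with Simpson's rule. The function names are
hypothetical.

```python
import math


def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f1(f_stat, df_error, steps=2000):
    """Upper-tail probability of an F(1, df_error) ratio.

    Uses F(1, n) = t(n)**2, so P(F > f) = 1 - 2 * P(0 < T < sqrt(f)).
    The inner probability is approximated with Simpson's rule.
    """
    if f_stat < 0:
        raise ValueError("F ratio must be non-negative")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_density(0.0, df_error) + t_density(upper, df_error)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df_error)
    central_half = total * h / 3  # P(0 < T < sqrt(F))
    return 1 - 2 * central_half


if __name__ == "__main__":
    # Response mode main effects as reported in the text.
    reported = [
        ("CRT errors", 11.88),
        ("CRT latency", 21.483),
        ("Digit task errors", 4.525),
        ("Digit task latency", 0.176),
    ]
    for label, f_stat in reported:
        print(f"{label}: F(1,22) = {f_stat}, p = {p_value_f1(f_stat, 22):.4f}")
```

Under this calculation the two CRT response mode effects fall
below .01, while F(1,22) = 4.525 for digit task errors corresponds
to a probability between .04 and .05. Readers may therefore wish
to check the significance level printed for that effect against
the original report.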


DISCUSSION 

The  results  indicate  that  performance  on  each  of  the  tasks 
was  degraded  when  voice  responding  as  opposed  to  keyboard 
responding  was  required  in  the  CRT  task.  Performance  degradation 
was  evident  in  higher  error  scores  for  both  the  CRT  and  digit 
processing tasks. Although the results revealed that keyboard
responding was faster than voice responding in the CRT task, this
finding  must  be  interpreted  with  due  consideration  of  the 
processing  characteristics  of  the  VDETS.  The  latency  data  for 
the  CRT  task  represented  the  elapsed  time  from  the  onset  of  the 
stimulus  to  the  completion  of  a  correct  response.  In  the  voice 
response  mode  this  time  included,  on  the  average,  slightly  over 
500  msec  required  by  the  VDETS  to  accomplish  utterance 
recognition.  Thus,  from  the  mean  latency  data  shown  in  Figure  2, 
it  can  be  seen  that  voice  responses  were  actually  initiated 
faster  than  keyboard  responses.  For  present  purposes,  the 
latency  data  from  the  CRT  task  should  not  be  interpreted  as 
providing  clear  evidence  of  the  superiority  of  either  the  voice 
or  keyboard  response  modes. 

Although  intertask  item  similarity  produced  no  statistically 
significant  effects,  differences  in  performance  when  digits  as 
opposed  to  colors  served  as  stimuli  in  the  CRT  task  were  in  the 
predicted  direction.  More  errors  occurred,  especially  in  the 
digit  processing  task,  when  digits  were  presented  in  the  CRT 
task.  The  present  investigation  clearly  did  not  represent  a 
definitive  attempt  to  explore  fully  the  effects  of  intertask 
stimulus  similarity  on  multitask  performance.  Rather,  this  study 
sought  to  highlight  the  effects  that  subtle,  often  overlooked 
task  variables  can  have  on  complex  task  performance  and  the  need 
for  designers  and  test  and  evaluation  personnel  to  consider  such 
factors  in  evaluating  new  technology  that  will  ostensibly  enhance 
human  performance.  Parametric  evaluations  are  needed  to  assess 
intertask  stimulus  similarity  and  other  characteristics  of  task 
structures  that  can  differentially  interfere  with  information 
processing  in  multitask  situations,  particularly  those  involving 
short-term  memory  requirements. 

Overall, the data suggest that the acoustical attributes of
the stimuli and responses in jointly performed tasks can give rise
to  intertask  interference,  especially  when  one  or  more  of  the 
tasks  require  rehearsal  and  retrieval  from  short-term  memory.  As 
inferred  from  previous  work  (2),  it  seems  most  likely  that 
rehearsal  and  retrieval  processes  active  during  continuous  memory 
task  performance  were  more  susceptible  to  disruption  by 
intervening  vocal  responses  than  by  manual  responses. 
Specifically,  these  results  signal  the  need  to  consider  further 
the  role  of  verbal  short-term  memory  in  applications  of  speech 
technology.  In  general,  though,  research  concerning  the  efficacy 
of  voice  I/O  for  command  and  control  operations  should  be 
directed  toward  identifying  other  potential  sources  of  intertask 
interference  with  human  information  processing. 

The  implications  of  the  present  results  for  system  designs 
that  contemplate  the  use  of  voice  interactive  systems  technology 
are  seemingly  straightforward.  It  is  not  simply  a  matter  of 
determining  if  a  function  can  be  performed  using  voice  I/O,  but 
rather  how  well  the  function  can  be  performed  in  the  context  of 
the  total  task  ensemble.  The  possibility  certainly  exists  that 
additional requirements for voice I/O could serve to degrade
overall  performance  in  some  competing  task  demand  situations. 
Such would be the case, for instance, during high workload,
transition  phases  of  flight  in  which  verbal  communication 
requirements  are  often  greatly  increased.  Research  that  further 
delineates  the  loci  and  extent  of  interference  effects  of  voice 
I/O  in  multitask  situations  should  provide  results  very  useful  to 
system  task  integration,  function  allocation,  and  the 
introduction  of  novel  human-machine  interfacing  techniques  in 
high  workload  environments. 